Potholes of New York City
Aurora Koch-Pongsema
Drivers, bikers, and pedestrians all hate potholes. Faced with fluctuating temperatures, heavy traffic, and a constant battle for public resources, the City of New York has to pick and choose which potholes to fix and when it can fix them. What affects how quickly a pothole is repaired? Are all boroughs treated equally? Do other factors, such as collisions related to potholes or distance from governmental offices, affect repair time? This project explores these questions using data sets from NYC OpenData and methods learned in a course at Lehman College in Spring 2016: Computer Science 464 (Special Topics): Data Science, taught by Professor Katherine St. John.
GitHub Repository
Data Sets
Primary: 311 Service Requests from 2010 to Present, downloaded April 17, 2016.
This data set contains all the reports, requests, and complaints from New Yorkers who contact 311 through any channel. For this project, I narrowed it to potholes by keeping only entries with a created date in 2015, a complaint type of 'Street Condition', and a descriptor of 'Pothole'. Because this data set is constantly updated with new entries and with status changes to existing entries, the status of some pothole reports created in 2015 may have changed since the download. Of the fields in this set, I focused on the created date, closed date, status, borough, and location (latitude and longitude) of each report in this range.
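The filter described above can be sketched with Python's standard library. This is a minimal illustration rather than the project's actual code: the column names ('Created Date', 'Complaint Type', 'Descriptor') and the date format are assumptions about the CSV export, and the sample rows are made up.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp format of the CSV export, e.g. "01/05/2015 09:30:00 AM".
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def is_2015_pothole(row):
    """True for a report created in 2015 with the pothole type and descriptor."""
    created = datetime.strptime(row["Created Date"], DATE_FORMAT)
    return (
        created.year == 2015
        and row["Complaint Type"] == "Street Condition"
        and row["Descriptor"] == "Pothole"
    )


def filter_potholes(lines):
    """Read CSV lines (a file object or any iterable of strings) and keep potholes."""
    return [row for row in csv.DictReader(lines) if is_2015_pothole(row)]


# Hypothetical sample rows, only for illustration.
SAMPLE_CSV = """Created Date,Closed Date,Complaint Type,Descriptor,Status,Borough,Latitude,Longitude
01/05/2015 09:30:00 AM,01/07/2015 10:00:00 AM,Street Condition,Pothole,Closed,BRONX,40.85,-73.88
03/02/2014 08:00:00 AM,03/03/2014 08:00:00 AM,Street Condition,Pothole,Closed,QUEENS,40.72,-73.80
06/10/2015 01:15:00 PM,,Noise,Loud Music/Party,Open,BROOKLYN,40.65,-73.95
"""

potholes = filter_potholes(io.StringIO(SAMPLE_CSV))
print(len(potholes), potholes[0]["Borough"])
```

With a real download, the same function could be given an open file handle instead of the in-memory sample.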
Secondary: NYPD Motor Vehicle Collisions, downloaded April 17, 2016.
This data set contains the 'Details of Motor Vehicle Collisions in New York City provided by the Police Department(NYPD).' Of particular interest in this data set are the 'Contributing Factor[s]' given for each collision, some of which specify 'Pavement Defective' as a reason for the collision. The location, date, and estimated time of the accidents are also included.
Techniques and Analysis
I combined mapping with statistical analysis to reduce the dimensionality of the data and look for extreme outliers, first examining repair times. I extracted the time between the creation of a pothole complaint and its closing date to see how long it took for a pothole to be officially resolved (if it was) and its report closed. In doing this, I examined the mean, standard deviation, and interquartile range to see trends, and then plotted the appropriate data both in 2D plots (comparing time lengths) and by plotting the latitude and longitude of the outliers on a basemap background. In this first analysis, I reduced dimensionality by considering only potholes with a status of closed and by discarding potholes with negative or zero repair times.
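The repair-time computation can be sketched as follows. This is a hedged illustration, not the project's code: the field names and date format are assumed, the sample reports are invented, and flagging outliers with the 1.5 × IQR upper fence (Tukey's rule) is my assumption, since the text does not say how outliers were chosen. It needs Python 3.8 or later for `statistics.quantiles`.

```python
import statistics
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed export format
SECONDS_PER_DAY = 86400


def repair_days(created, closed):
    """Days elapsed between the created and closed timestamps."""
    delta = datetime.strptime(closed, DATE_FORMAT) - datetime.strptime(created, DATE_FORMAT)
    return delta.total_seconds() / SECONDS_PER_DAY


def closed_repair_times(reports):
    """Repair times for closed reports, dropping zero or negative durations."""
    times = []
    for report in reports:
        if report["Status"] != "Closed" or not report["Closed Date"]:
            continue
        days = repair_days(report["Created Date"], report["Closed Date"])
        if days > 0:
            times.append(days)
    return times


def summarize(times):
    """Mean, standard deviation, IQR, and values above the assumed upper fence."""
    q1, _, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr  # assumption: Tukey's rule for outliers
    return {
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "iqr": iqr,
        "outliers": [t for t in times if t > upper_fence],
    }


def make_report(days, status="Closed"):
    """Build a hypothetical report closed `days` after it was created."""
    created = datetime(2015, 1, 1, 9, 0, 0)
    closed = created + timedelta(days=days)
    return {
        "Created Date": created.strftime(DATE_FORMAT),
        "Closed Date": closed.strftime(DATE_FORMAT),
        "Status": status,
    }


reports = [make_report(d) for d in (1, 2, 2, 3, 3, 4, 5, 90, 0, -1)]
reports.append(make_report(7, status="Open"))

times = closed_repair_times(reports)
summary = summarize(times)
print(summary)
```

The zero-day, negative, and open reports in the sample are dropped before any statistics are computed, which mirrors the reduction described above.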
Here, I plotted the created date of each successfully closed pothole report against its closed date. The first (blue) graph includes all reports with a valid location and a status of 'Closed'.
In the second, red graph, I threw out tuples with zero repair times.
For the second data set, I examined the contributing factors reported by the NYPD for each collision and kept only the collisions that cited 'Pavement Defective' (that is, a problem with the road surface) and had a valid location.
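A minimal sketch of that selection is below. The column names ('CONTRIBUTING FACTOR VEHICLE 1' through 'CONTRIBUTING FACTOR VEHICLE 5', 'LATITUDE', 'LONGITUDE') are assumptions about the collision export, and treating a location as valid when both coordinates parse as nonzero numbers is also an assumption. The sample rows are hypothetical.

```python
# Assumed column names for the per-vehicle contributing factors.
FACTOR_COLUMNS = [f"CONTRIBUTING FACTOR VEHICLE {i}" for i in range(1, 6)]
TARGET_FACTOR = "Pavement Defective"


def has_valid_location(row):
    """Assumed validity rule: both coordinates parse as nonzero floats."""
    try:
        lat = float(row.get("LATITUDE") or "")
        lon = float(row.get("LONGITUDE") or "")
    except ValueError:
        return False
    return lat != 0.0 and lon != 0.0


def cites_defective_pavement(row):
    """True if any vehicle's contributing factor is 'Pavement Defective'."""
    return any((row.get(col) or "").strip() == TARGET_FACTOR for col in FACTOR_COLUMNS)


def pavement_collisions(rows):
    """Keep collisions that cite defective pavement and have a usable location."""
    return [r for r in rows if cites_defective_pavement(r) and has_valid_location(r)]


# Hypothetical sample rows, only for illustration.
sample_rows = [
    {"DATE": "02/11/2015", "LATITUDE": "40.71", "LONGITUDE": "-73.99",
     "CONTRIBUTING FACTOR VEHICLE 1": "Pavement Defective",
     "CONTRIBUTING FACTOR VEHICLE 2": "Unspecified"},
    {"DATE": "02/12/2015", "LATITUDE": "", "LONGITUDE": "",
     "CONTRIBUTING FACTOR VEHICLE 1": "Pavement Defective",
     "CONTRIBUTING FACTOR VEHICLE 2": ""},
    {"DATE": "02/13/2015", "LATITUDE": "40.80", "LONGITUDE": "-73.95",
     "CONTRIBUTING FACTOR VEHICLE 1": "Driver Inattention/Distraction",
     "CONTRIBUTING FACTOR VEHICLE 2": "Pavement Defective"},
    {"DATE": "02/14/2015", "LATITUDE": "40.75", "LONGITUDE": "-73.90",
     "CONTRIBUTING FACTOR VEHICLE 1": "Unspecified",
     "CONTRIBUTING FACTOR VEHICLE 2": ""},
]

selected = pavement_collisions(sample_rows)
print([row["DATE"] for row in selected])
```

Checking every vehicle's factor column, rather than only the first, keeps collisions where defective pavement was listed as a secondary cause.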
Citations
Code and Inspiration
Grus, Joel. "Data Science from Scratch: First Principles with Python." O'Reilly Media, 30 April 2015. Original repository of code at https://github.com/joelgrus/data-science-from-scratch with additional code downloaded and used as of April/May 2016.
Data Sets
"311 Service Requests from 2010 to Present." City of New York, NYC OpenData. https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9 Web. 17 April 2016.
"NYPD Motor Vehicle Collisions." City of New York, NYC OpenData. https://nycopendata.socrata.com/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95 Web. 17 April 2016.